
* Getting school decile from 'moe_clean.enrolment' file to merge with our working file

use "XXX\moe_clean.enrolment.dta" 
sort moe_enr_last_school_nbr
keep moe_enr_year_nbr moe_enr_last_school_nbr moe_enr_last_school_region_code moe_enr_last_school_decile_code
sort moe_enr_last_school_nbr
by moe_enr_last_school_nbr: tab moe_enr_year_nbr moe_enr_last_school_decile_code
rename moe_enr_last_school_decile_code school_decile
label variable school_decile "moe_enr_last_school_decile_code"
rename moe_enr_last_school_region_code school_region
label variable school_region "moe_enr_last_school_region_code"
rename moe_enr_last_school_nbr schoolno
label variable schoolno "moe_enr_last_school_nbr"
rename moe_enr_year_nbr year
label variable year "moe_enr_year_nbr"

* checking for any missing values for 'year' variable (so we don't drop any important observations because of this).
tab year
* no missing values so can proceed.

* following is to keep the latest value of 'school_decile' and 'school region' for every 'schoolno' based on the latest 'year'.
sort schoolno
bysort schoolno: egen year_x=max( year )
keep if (year==year_x)

drop year_x

* checking for any duplicate values of 'school_decile' and 'school_region' for same schoolno.
sort schoolno
bysort schoolno: egen decile_x=max(school_decile)
tab decile_x school_decile
drop decile_x

sort schoolno
bysort schoolno: egen reg_x=max(school_region)
tab reg_x school_region
drop reg_x

* dropping duplicate values for 'schoolno'
sort schoolno 
quietly by schoolno : gen dup = cond(_N==1,0,_n)
keep if dup==0 | dup==1
drop dup

* cleaning the 'school_decile' variable to make it more recent.
tab year
tab year school_decile
replace school_decile=. if year==2004 |  year==2006 | year==2007 | year==2009 | year==2010 | year==2011 | year==2012 | year==2013 | year==2014 | year==2015

drop in **last observation** 
sort schoolno
